# install.packages('duckplyr')
# install.packages('geodata')
library(tidyverse) # for data manipulation
library(duckplyr) # for fast data processing
library(phytools) # for phylogenetic regression
library(lme4) # for linear and generalized linear mixed models (GLMM)
library(rnaturalearth) # for world map data
library(sf) # for vector spatial data
library(raster)
Mapping Species Richness: Integrating Occurrence Data, Climatic Variables, and Phylogenetic Insights in a Global Grid Analysis - A minimal example
This notebook is a minimal tutorial on a spatial analysis pipeline that maps species occurrence points to WorldClim variables and overlays them on a global grid to quantify species richness per cell. It also includes examples of using GLM, GLMM, and phylogenetic regression to examine how climate variation predicts an ordinal measure of ant polymorphism.
Keywords: Ant polymorphism, Phylogenetic regression, Global biodiversity, Ecological complexity, Quantitative ecology
1 Getting started
Before you start:
Make sure you have the latest version of R installed.
Open R in an IDE of your choice (RStudio, VS Code, Jupyter, etc.).
Create an empty script (.R) or notebook (.Rmd, .qmd).
Copy the code from this notebook and run it on your local machine.
Alternatively, clone the GitHub repository or download the notebook source code and open it locally.
- Make sure Quarto is installed if you go this route. RStudio ships with Quarto preinstalled; for VS Code and other editors, you need to install the Quarto extension.
2 Dependencies
To replicate this tutorial, make sure the following packages are installed. To install a package, use install.packages('package_name'). Note that you only need to do this once.
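Because installation is needed only once, it can help to check which packages are still missing before calling install.packages(). The following is a minimal sketch, not part of the original tutorial: the helper name missing_packages is hypothetical, and the package vector simply mirrors the library() calls and the DT:: call used in this notebook.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!is_installed]
}

# Packages used in this notebook (assumed list, taken from the code above)
pkgs <- c("tidyverse", "duckplyr", "phytools", "lme4",
          "rnaturalearth", "sf", "raster", "DT")
to_install <- missing_packages(pkgs)

# Uncomment to install whatever is missing (requires an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```

Checking with system.file() avoids loading any package as a side effect, so the check is safe to run at the top of a script.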
3 Sourcing data
For this tutorial we will use the ant polymorphism database published as part of the article: LaRichelliere et al., 2023. Warm regions of the world are hotspots of superorganism complexity.
The dataset is open and public. You can download your own copy of the data by cloning the paper's GitHub repository: https://github.com/lessardlab/GlobalPolyMorp
# Source data on global ant polymorphism.
my_ant_data <- duckplyr_df_from_csv("Lat-Long_Data_GABI.csv")
summary(my_ant_data)
duckplyr: materializing
 gabi_acc_number    valid_species_name   country                dec_lat
 Length:743211      Length:743211        Length:743211       Min.   :-55.083
 Class :character   Class :character     Class :character    1st Qu.: -8.367
 Mode  :character   Mode  :character     Mode  :character    Median : 14.481
                                                             Mean   : 14.816
                                                             3rd Qu.: 37.845
                                                             Max.   : 88.416
    dec_long          elevation       bentity2_name
 Min.   :-180.00   Min.   : -80.0    Length:743211
 1st Qu.: -85.02   1st Qu.: 130.0    Class :character
 Median : -59.64   Median : 500.0    Mode  :character
 Mean   : -19.46   Mean   : 669.8
 3rd Qu.:  34.87   3rd Qu.:1090.0
 Max.   : 179.97   Max.   :5300.0
 NA's   :151       NA's   :340610
head(my_ant_data)
duckplyr: materializing
# A tibble: 6 × 7
  gabi_acc_number valid_species_name        country dec_lat dec_long elevation
  <chr>           <chr>                     <chr>     <dbl>    <dbl>     <dbl>
1 GABI_01146836   Myrmecocystus.semirufus   USA        36.5    -117.       -80
2 GABI_01026649   Myrmecocystus.semirufus   USA        36.5    -117.       -80
3 GABI_01033314   Myrmecocystus.semirufus   USA        36.5    -117.       -80
4 GABI_01174725   Myrmecocystus.semirufus   USA        36.5    -117.       -80
5 GABI_01130976   Pogonomyrmex.californicus USA        36.5    -117.       -80
6 GABI_01032969   Pogonomyrmex.californicus USA        36.5    -117.       -80
# ℹ 1 more variable: bentity2_name <chr>
Let’s start by tidying the dataset. For instance, we can split valid_species_name into the genus and the species epithet, and build a version of the full species name with a space in place of the dot.
my_ant_data <-
  my_ant_data |>
  mutate(Genus = str_extract(valid_species_name, "^([^.]+)"), # text before the first dot
         species = str_extract(valid_species_name, "([^.]+)$"), # text after the last dot
         species_name_no_dot = str_replace(valid_species_name, "\\.", " ")) # first dot becomes a space
# Display the new columns as an interactive table
my_ant_data |>
  duckplyr::select(Genus, species, species_name_no_dot) |>
  DT::datatable()
duckplyr: materializing
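To make the behaviour of these patterns concrete, here is a small self-contained sketch on a toy vector. It uses base R sub() calls that are meant to mirror the stringr patterns above for non-empty names, so it runs without any packages. The function name split_species_name is hypothetical, and the third input is a made-up placeholder for a name with two dots, included only to show an edge case.

```r
# Hypothetical helper mirroring the three derived columns with base R.
split_species_name <- function(x) {
  data.frame(
    # drop everything from the first dot onward: keeps the genus
    Genus = sub("\\..*$", "", x),
    # drop everything up to the last dot: keeps the final token
    species = sub("^.*\\.", "", x),
    # replace only the first literal dot with a space
    species_name_no_dot = sub(".", " ", x, fixed = TRUE),
    stringsAsFactors = FALSE
  )
}

toy_names <- c("Myrmecocystus.semirufus",
               "Pogonomyrmex.californicus",
               "Genus.species.subspecies") # placeholder with two dots
split_species_name(toy_names)
```

For names with more than one dot, the species column picks up the last token rather than the epithet, and only the first dot is replaced in species_name_no_dot. If the data contain such names, the patterns would need adjusting.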